147 research outputs found

    A new multidimensional model with text dimensions: definition and implementation

    We present a new multidimensional model with textual dimensions based on a knowledge structure extracted from the texts, in which any textual attribute in a database can be processed, not only XML texts. This dimension allows textual data to be treated in the same way as non-textual data, automatically and without user intervention, so all the classical operations of the multidimensional model can be defined for the textual dimension. While most of the text-oriented models found in the literature have not been implemented, in this proposal the multidimensional model and the OLAP system have been implemented in a software tool, so they can be tested on real data. A case study with medical data is included in this work.

    Funding: Junta de Andalucia P07-TIC02786, P10-TIC6109, P11-TIC746.
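    To make the idea of a classical OLAP operation over a textual dimension concrete, here is a minimal roll-up sketch. It assumes each textual value has already been reduced to one term of a concept hierarchy; the hierarchy, the attribute names and the `ancestor`/`roll_up` helpers are hypothetical illustrations, not the knowledge structure or the tool described in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical concept hierarchy extracted from the texts: each term points to its
# broader concept, and None marks the top level. Illustrative only.
HIERARCHY: Dict[str, Optional[str]] = {
    "asthma": "respiratory disease",
    "bronchitis": "respiratory disease",
    "angina": "cardiovascular disease",
    "respiratory disease": "disease",
    "cardiovascular disease": "disease",
    "disease": None,
}


def ancestor(term: str, levels_up: int) -> str:
    """Climb the hierarchy `levels_up` steps, stopping at a top-level concept."""
    for _ in range(levels_up):
        parent = HIERARCHY.get(term)
        if parent is None:
            break
        term = parent
    return term


def roll_up(facts: List[dict], text_attr: str, measure: str, levels_up: int) -> Dict[str, float]:
    """Sum a measure over the textual dimension after rolling it up `levels_up` levels."""
    totals: Dict[str, float] = defaultdict(float)
    for fact in facts:
        # Assumes each textual value has already been reduced to one hierarchy term.
        totals[ancestor(fact[text_attr], levels_up)] += fact[measure]
    return dict(totals)


facts = [
    {"diagnosis": "asthma", "admissions": 3},
    {"diagnosis": "bronchitis", "admissions": 2},
    {"diagnosis": "angina", "admissions": 4},
]
print(roll_up(facts, "diagnosis", "admissions", 1))
# {'respiratory disease': 5.0, 'cardiovascular disease': 4.0}
```

    Drill-down is the same call with a smaller `levels_up`; slicing would simply filter `facts` on a concept before aggregating.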

    International Research Funding

    Organized by: Oficina de Proyectos Internacionales UGR, Welcome Center de la UGR, Agencia Andaluza del Conocimiento and Fundación Española para la Ciencia y la Tecnología (FECYT). It shows how research can be funded through international funding sources.

    Non-Query-Based Pattern Mining and Sentiment Analysis for Massive Microblogging Online Texts

    Pattern mining has been widely studied in the last decade given its great research interest and its numerous real-world applications. In this paper, a definition of query-based and non-query-based systems is proposed, highlighting the need for non-query-based systems in the era of Big Data. To this end, we propose a new non-query-based system that combines association rules, generalized rules and sentiment analysis in order to catalogue and discover opinion patterns in the social network Twitter. Association rules have previously been applied to sentiment analysis, but in most cases they are used after the sentiment analysis process has finished, to see which tokens commonly appear related to a certain sentiment. They have also been used to discover patterns between sentiments. Our work differs from these in that it proposes a non-query-based system that combines both techniques, in a mixed proposal of sentiment analysis and association rules, to discover patterns and sentiment patterns in microblogging texts. The obtained rules generalize and summarize the sentiments expressed in a group of tweets about any character, brand or product mentioned in them. To study the performance of the proposed system, an initial set of 1.7 million tweets was used to analyse the most salient sentiments during the American pre-election campaign. The results support the system's capability to obtain association rules and patterns with great descriptive value in this use case. Parallels can be established between these patterns and real-life events, which they match perfectly.

    Funding: COPKIT Project, through the European Union's Horizon 2020 Research and Innovation Programme (786687); Spanish Ministry for Economy and Competitiveness (TIN2015-64776-C3-1-R); Andalusian Government, through the Data Analysis in Medicine: from Medical Records to Big Data Project (P18-RT-2947); Spanish Ministry of Education, Culture, and Sport (FPU18/00150); University of Granada.
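    As a rough illustration of combining sentiment analysis with association rules, the sketch below mines rules of the form {token} → sentiment from tweets that already carry a sentiment label. It is a simplification under stated assumptions: the labels are assumed to come from some upstream sentiment analyser (not shown), antecedents are single tokens only, and `sentiment_rules` with its thresholds is a hypothetical helper, not the authors' system, which also uses generalized rules.

```python
from collections import Counter
from typing import List, Set, Tuple

# Each tweet: (set of tokens, sentiment label). The labels are assumed to come from
# an upstream sentiment analyser that is not shown here.
Tweet = Tuple[Set[str], str]


def sentiment_rules(tweets: List[Tweet], min_support: float, min_confidence: float):
    """Mine rules {token} -> sentiment, returned as (token, label, support, confidence)."""
    n = len(tweets)
    token_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tokens, label in tweets:
        for token in tokens:
            token_counts[token] += 1
            pair_counts[(token, label)] += 1
    rules = []
    for (token, label), count in pair_counts.items():
        support = count / n  # fraction of all tweets containing both token and label
        confidence = count / token_counts[token]  # estimate of P(label | token)
        if support >= min_support and confidence >= min_confidence:
            rules.append((token, label, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


tweets = [
    ({"candidate_a", "debate"}, "positive"),
    ({"candidate_a", "tax"}, "negative"),
    ({"candidate_a", "debate"}, "positive"),
    ({"tax"}, "negative"),
]
for rule in sentiment_rules(tweets, min_support=0.4, min_confidence=0.6):
    print(rule)
```

    Treating the sentiment label as just another item is what lets a standard support/confidence framework describe opinion patterns without a user-supplied query.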

    NOFACE: A new framework for irrelevant content filtering in social media according to credibility and expertise

    Social networks have taken on an irreplaceable role in our lives. They are used daily by millions of people to communicate and to stay informed. This success has also led to a large amount of irrelevant content and even misinformation on social media. In this paper, we propose a user-centred framework that reduces the amount of irrelevant content in social networks to support later stages of data mining processes. The system also helps reduce misinformation in social networks, since it selects credible and reputable users. It is based on the premise that if a user is credible, their content will be credible too. In a first stage, our proposal uses word embeddings to build a set of interesting users according to their expertise. In a second stage, it employs social network metrics to further narrow down the relevant users according to their credibility in the network. To validate the framework, it has been tested on two real Big Data problems on Twitter: one related to COVID-19 tweets and the other to the most recent United States elections, held on 3 November. In both problems, finding relevant content may be difficult due to the large amount of data published in recent years. The proposed framework, called NOFACE, reduces the number of irrelevant users posting about the topic, keeping only those with higher credibility and thus providing interesting information about the selected topic. This reduces irrelevant information and therefore mitigates the presence of misinformation when a data mining method is subsequently applied, improving the results obtained, as illustrated for these two topics using clustering, association rules and LDA techniques.

    Funding: European Commission (786687); Andalusian government FEDER operative program (P18-RT-2947, B-TIC-145-UGR18); University of Granada's internal plan (PPJIB2021-04); Spanish Government (FPU18/0015
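    The two-stage idea (expertise first, then network credibility) can be sketched as follows. This is a minimal sketch under loud assumptions: each user is represented by a precomputed profile vector (for instance, averaged word embeddings of their posts), and "credibility" is approximated by how many other experts follow the user. `filter_users`, the thresholds and the follower map are hypothetical; NOFACE's actual metrics may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_users(
    user_vectors: Dict[str, List[float]],
    topic_vector: List[float],
    followers: Dict[str, List[str]],  # user -> users who follow them (assumed input)
    min_similarity: float,
    min_in_degree: int,
) -> List[str]:
    """Hypothetical two-stage filter: topical expertise, then a network credibility proxy."""
    # Stage 1: keep users whose profile vector is close to the topic vector.
    experts = {u for u, vec in user_vectors.items() if cosine(vec, topic_vector) >= min_similarity}
    # Stage 2: keep experts followed by at least `min_in_degree` other experts.
    credible = []
    for user in sorted(experts):
        in_degree = sum(1 for f in followers.get(user, []) if f in experts and f != user)
        if in_degree >= min_in_degree:
            credible.append(user)
    return credible


users = {"ana": [0.9, 0.1], "bob": [0.8, 0.3], "cy": [0.1, 0.9], "dee": [0.7, 0.2]}
follows = {"ana": ["bob", "dee", "cy"], "bob": ["cy"], "dee": ["ana"]}
print(filter_users(users, [1.0, 0.0], follows, min_similarity=0.9, min_in_degree=1))
# ['ana', 'dee']
```

    Counting only followers who are themselves experts keeps popular but off-topic accounts from lending credibility, which is one plausible reading of "credible and reputable users"; PageRank over the expert subgraph would be a natural drop-in alternative.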

    Spark solutions for discovering fuzzy association rules in Big Data

    The computational cost of mining fuzzy association rules grows significantly with very large data sets, in many cases triggering a memory overflow error and causing the experiment to fail before completion. In these cases, Big Data techniques can help the experiment run to completion. Therefore, in this paper several Spark algorithms are proposed to handle massive fuzzy data and discover interesting association rules. To this end, we rely on a decomposition of interestingness measures in terms of α-cuts, and we show experimentally that it is sufficient to consider only 10 equidistributed α-cuts to mine all significant fuzzy association rules. Additionally, all the proposals are compared and analysed in terms of efficiency and speed-up on several datasets, including a real dataset of sensor measurements from an office building.

    Funding: The research reported in this paper was partially supported by the COPKIT project from the 8th Framework Programme (H2020) research and innovation programme (grant agreement No 786687) and by the BIGDATAMED projects with references B-TIC-145-UGR18 and P18-RT-2947.
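    To see why a handful of equidistributed α-cuts can stand in for a fuzzy measure, consider a single membership degree μ: averaging the indicator [μ ≥ α] over α = 1/k, 2/k, ..., 1 gives ⌊kμ⌋/k, which differs from μ by less than 1/k. The single-machine sketch below applies this to support, assuming support is defined as the mean membership degree of an itemset under the min t-norm. That definition, the item names and the helper functions are assumptions for illustration; the paper's decomposition of interestingness measures and its Spark implementation are not reproduced here.

```python
from typing import Dict, List, Sequence

# A fuzzy transaction maps each item to a membership degree in [0, 1].
FuzzyTransaction = Dict[str, float]


def degree(transaction: FuzzyTransaction, itemset: Sequence[str]) -> float:
    """Degree to which a transaction contains an itemset (min t-norm, an assumption)."""
    return min(transaction.get(item, 0.0) for item in itemset)


def crisp_support(transactions: List[FuzzyTransaction], itemset: Sequence[str], alpha: float) -> float:
    """Ordinary support on the alpha-cut: share of transactions with degree >= alpha."""
    hits = sum(1 for t in transactions if degree(t, itemset) >= alpha)
    return hits / len(transactions)


def fuzzy_support_alpha_cuts(transactions, itemset, levels: int = 10) -> float:
    """Average the crisp supports over `levels` equidistributed alpha-cuts 1/levels, ..., 1."""
    alphas = [i / levels for i in range(1, levels + 1)]
    return sum(crisp_support(transactions, itemset, a) for a in alphas) / levels


def fuzzy_support_exact(transactions, itemset) -> float:
    """Reference value: mean membership degree of the itemset (a sigma-count support)."""
    return sum(degree(t, itemset) for t in transactions) / len(transactions)


sensors = [
    {"temp_high": 0.8, "hum_low": 0.5},
    {"temp_high": 0.3, "hum_low": 0.9},
    {"temp_high": 1.0, "hum_low": 1.0},
]
pair = ["temp_high", "hum_low"]
print(fuzzy_support_alpha_cuts(sensors, pair), fuzzy_support_exact(sensors, pair))
```

    The attraction for Big Data is that each α-cut turns the fuzzy problem into an ordinary crisp counting problem, which existing distributed itemset counters can handle.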

    New Spark solutions for distributed frequent itemset and association rule mining algorithms

    The large amount of data generated every day makes it necessary to develop new methods capable of handling massive data efficiently. This is the case for association rules, an unsupervised data mining tool capable of extracting information in the form of IF-THEN patterns. Although several methods have been proposed for extracting frequent itemsets (the phase that precedes association rule mining) from very large databases, the high computational cost and lack of memory remain major problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for frequent itemset and association rule mining, (2) to develop new efficient Big Data frequent itemset algorithms using distributed computation, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with existing proposals, varying the number of transactions and the number of items. For this purpose, we have used the Spark platform, which has been shown to outperform existing distributed algorithm implementations.

    Funding: Funding for open access publishing and for the open access charge: Universidad de Granada/CBUA. The research reported in this paper was partially supported by the BIGDATAMED project, which has received funding from the Andalusian Government (Junta de Andalucía) under grant agreement No P18-RT-1765, by Grants PID2021-123960OB-I00 and TED2021-129402B-C21 funded by Ministerio de Ciencia e Innovación and by ERDF A way of making Europe, and by the European Union NextGenerationEU. In addition, this work has been partially supported by the Ministry of Universities through the EU-funded Margarita Salas programme NextGenerationEU. Additional funders listed: Ministry of Science and Innovation, Spain (MICINN); Instituto de Salud Carlos III.
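    The general pattern behind distributed frequent itemset mining (count candidates locally in each data partition, then merge the counts) can be simulated in plain Python. This is a level-wise, Apriori-style sketch, not one of the paper's Spark algorithms; the partitions, item names and helper functions are hypothetical. In PySpark the per-partition step would roughly correspond to `RDD.mapPartitions` and the merge to a reduce over the partial counts.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def count_partition(partition: List[Set[str]], candidates: List[FrozenSet[str]]) -> Counter:
    """Map step: count how many transactions in one partition contain each candidate."""
    counts: Counter = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                counts[candidate] += 1
    return counts


def frequent_itemsets(partitions: List[List[Set[str]]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Level-wise mining where each level is counted per partition and the counts merged."""
    n = sum(len(p) for p in partitions)
    items = {item for p in partitions for t in p for item in t}
    candidates = [frozenset([i]) for i in sorted(items)]
    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while candidates:
        # Reduce step: merge the per-partition counts into global counts.
        total: Counter = Counter()
        for partition in partitions:
            total.update(count_partition(partition, candidates))
        level = {c: total[c] / n for c in candidates if total[c] / n >= min_support}
        result.update(level)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        frequent = list(level)
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = [c for c in joined if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result


partitions = [
    [{"bread", "milk"}, {"bread", "butter"}],
    [{"bread", "milk", "butter"}, {"milk"}],
]
for itemset, support in frequent_itemsets(partitions, min_support=0.5).items():
    print(sorted(itemset), support)
```

    Only the small candidate list travels to each partition and only counts travel back, which is why this pattern scales with the number of transactions; the number of distinct items remains the harder dimension.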

    Rules and fuzzy rules in text: concept, extraction and usage

    Several concepts and techniques have been imported into the field of textual data from other disciplines such as Machine Learning and Artificial Intelligence. In this paper, we focus on the concept of rule and on the management of uncertainty in text applications. We detail the different structures considered for building rules, the extraction of the knowledge base, and the applications and usage of these rules. We include a review of the most relevant works on the different types of rules, organized by their representation and by their application to the most common Information Retrieval tasks, such as categorization, indexing and classification.
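    As a toy example of a fuzzy rule applied to text categorization, the sketch below fires rules of the form "IF term1 AND term2 THEN category". Every modelling choice here is an assumption rather than something taken from the review: term membership is the term frequency divided by the maximum term frequency, AND is the minimum, and several rules for one category are combined with the maximum. The rules and helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A rule: IF all antecedent terms are present THEN category (hypothetical rules below).
Rule = Tuple[List[str], str]


def term_memberships(text: str) -> Dict[str, float]:
    """Fuzzy membership of each term: its frequency divided by the maximum term frequency."""
    counts = Counter(text.lower().split())
    top = max(counts.values())  # raises ValueError on empty text
    return {term: c / top for term, c in counts.items()}


def classify(text: str, rules: List[Rule]) -> Dict[str, float]:
    """Fire each rule with degree min(memberships); keep the max degree per category."""
    mu = term_memberships(text)
    scores: Dict[str, float] = {}
    for antecedent, category in rules:
        degree = min(mu.get(term, 0.0) for term in antecedent)
        scores[category] = max(scores.get(category, 0.0), degree)
    return scores


rules = [(["goal", "match"], "sports"), (["vote", "election"], "politics")]
print(classify("match report goal goal late in the match", rules))
# {'sports': 1.0, 'politics': 0.0}
```

    Unlike a crisp rule, which either fires or not, this version returns a graded degree per category, which is one simple way of managing uncertainty in text.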

    A Word Embedding-Based Method for Unsupervised Adaptation of Cooking Recipes

    Studying food recipes is indispensable for understanding the science of cooking. An essential problem in food computing is the adaptation of recipes to user needs and preferences. The main difficulty when adapting recipes lies in determining the relations between ingredients, which are complex and hard to interpret. Word embedding models can capture the semantics of food items in a recipe, helping to understand how ingredients are combined and substituted. In this work, we propose an unsupervised method for adapting recipes to user preferences. To learn food representations and relations, we create and apply a domain-specific word embedding model. In contrast to previous works, we train the model not only on the list of ingredients but also on the cooking instructions. We enrich the ingredient data by mapping the ingredients to a nutrition database to guide the adaptation and find ingredient substitutes. We performed three kinds of recipe adaptation: based on nutritional preferences, adapting to similar ingredients, and adapting to vegetarian and vegan dietary restrictions. At a 95% confidence level, our method can obtain good-quality adapted recipes without prior knowledge extraction in the recipe adaptation domain. Our results confirm the potential of using a domain-specific semantic model to tackle the recipe adaptation task.

    Funding: European Commission (816303); University of Granada.
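    The substitution step can be pictured as a nearest-neighbour search in the embedding space, restricted by a dietary constraint. In this sketch the three-dimensional vectors are made up for illustration (a real model would be trained on ingredient lists and cooking instructions), and the constraint is simplified to a set of forbidden ingredients; the paper instead guides adaptation through a nutrition database. `best_substitute` is a hypothetical helper.

```python
import math
from typing import Dict, List, Optional, Set


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def best_substitute(ingredient: str, embeddings: Dict[str, List[float]], forbidden: Set[str]) -> Optional[str]:
    """Most similar ingredient in embedding space that is not forbidden (e.g. non-vegan)."""
    target = embeddings[ingredient]
    candidates = [
        (cosine(target, vec), name)
        for name, vec in embeddings.items()
        if name != ingredient and name not in forbidden
    ]
    return max(candidates)[1] if candidates else None


# Toy vectors, invented for illustration only.
embeddings = {
    "butter": [0.9, 0.1, 0.0],
    "margarine": [0.85, 0.15, 0.05],
    "olive_oil": [0.7, 0.3, 0.1],
    "cream": [0.8, 0.0, 0.3],
    "chicken": [0.0, 0.9, 0.2],
}
non_vegan = {"butter", "cream", "chicken"}
print(best_substitute("butter", embeddings, non_vegan))
# margarine
```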

    Evolutionary Approach for Building, Exploring and Recommending Complex Items With Application in Nutritional Interventions

    Over the last few years, the ability of recommender systems to help us in different environments has been increasing. Several systems try to offer solutions in highly complex environments such as nutrition, housing or travel. In this paper, we present a recommendation system capable of using different input sources (data-based and knowledge-based) and producing a complex structured output. We have used an evolutionary approach to combine several unitary items within a flexible structure and have built an initial set of complex configurable items. Then, a content-based approach refines these candidates (in terms of preferences) to offer a final recommendation. We conclude with the application of this approach to the healthy diet recommendation problem, discussing its strengths in this domain.

    Funding: European Union (Stance4Health) under Grant 816303; Ministerio de Ciencia e Innovación under Grant PID2021-123960OB-I00, MCIN (Ministerio de Ciencia e Innovación)/AEI (Agencia estatal de Investigacion)/10.13039/501100011033, ERDF (European Regional Development Fund) A way of making Europe; and in part under Grant TED2021-129402B-C21 funded by MCIN (Ministerio de Ciencia e Innovación)/AEI (Agencia estatal de Investigacion)/10.13039/501100011033 and European Union NextGenerationEU/PRTR (Plan de Recuperación, Transformación y Resiliencia); 'Program of Information and Communication technologies' at the University of Granada.
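    The evolutionary building stage can be sketched as a small genetic algorithm that assembles a complex item (here, a menu of dishes) from unitary items. The catalogue, the nutritional target, the fitness function and the crossover/mutation operators are all hypothetical choices for illustration, and the later content-based refinement is not shown; this is not the authors' algorithm.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical catalogue: dish -> (kcal, grams of protein). Not real nutritional data.
Catalogue = Dict[str, Tuple[float, float]]


def fitness(menu: List[str], catalogue: Catalogue, target: Tuple[float, float]) -> float:
    """Higher is better: minus the relative distance of the menu's totals from the target."""
    kcal = sum(catalogue[d][0] for d in menu)
    protein = sum(catalogue[d][1] for d in menu)
    return -(abs(kcal - target[0]) / target[0] + abs(protein - target[1]) / target[1])


def evolve_menu(catalogue: Catalogue, target: Tuple[float, float], size: int = 3,
                population: int = 20, generations: int = 50, seed: int = 0) -> List[str]:
    """Elitist GA over menus of `size` distinct dishes (needs size >= 2, population >= 4)."""
    rng = random.Random(seed)
    dishes = sorted(catalogue)
    pop = [rng.sample(dishes, size) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, catalogue, target), reverse=True)
        survivors = pop[: population // 2]  # elitism: the best half is kept unchanged
        children = []
        while len(survivors) + len(children) < population:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, size)
            # One-point crossover that keeps dishes distinct.
            child = (a[:cut] + [d for d in b if d not in a[:cut]])[:size]
            if rng.random() < 0.3:  # mutation: swap one dish for an unused one
                unused = [d for d in dishes if d not in child]
                if unused:
                    child[rng.randrange(size)] = rng.choice(unused)
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda m: fitness(m, catalogue, target))


catalogue = {"salad": (150, 4), "lentils": (350, 18), "salmon": (400, 30),
             "rice": (250, 5), "yogurt": (120, 10), "fruit": (90, 1)}
print(evolve_menu(catalogue, target=(900, 50)))
```

    Because the best half always survives, the best fitness in the population can never decrease from one generation to the next.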

    A fuzzy-based medical system for pattern mining in a distributed environment: Application to diagnostic and co-morbidity

    In this paper, we address the extraction of hidden knowledge from medical records using data mining techniques such as association rules, combined with fuzzy logic, in a distributed environment. A significant challenge in this domain is that, although many studies are devoted to analysing health data, very few focus on the understanding and interpretability of the data and of the hidden patterns within them. Many health data analysis studies have focused on classification, prediction or knowledge extraction, and end users find little interpretability or understanding in the results. This is due to the use of black-box algorithms or to the nature of the data not being represented correctly. This is why the analysis must focus not only on knowledge extraction but also on the transformation and processing of the data, to better model their nature. Techniques such as association rule mining and fuzzy logic help to improve the interpretability of the data and to treat them with the uncertainty inherent in real-world data. To this end, we propose a system that automatically: a) pre-processes the database, transforming and adapting the data for the data mining process and enriching them to generate more interesting patterns; b) fuzzifies the medical database to represent and analyse real-world medical data with their inherent uncertainty; c) discovers interrelations and patterns among different features (diagnosis, hospital discharge, etc.); and d) visualizes the obtained results efficiently to facilitate the analysis and improve the interpretability of the extracted information. Our proposed system yields a significant increase in the comprehension and interpretability of medical data for end users, allowing them to analyse the data correctly and make the right decisions. We present a practical case using two health-related datasets to demonstrate the feasibility of our proposal on real data.

    Funding: Junta de Andalucia (P18-RT-1765); Ministry of Universities through the E
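    Step b) above, fuzzification, can be illustrated by mapping a numeric attribute such as a patient's age onto overlapping linguistic labels. The trapezoidal shape and the breakpoints in `AGE_LABELS` are hypothetical assumptions chosen for the example, not the partitions used in the paper.

```python
from typing import Dict, Tuple

INF = float("inf")

# Hypothetical linguistic labels for age: (a, b, c, d) trapezoid breakpoints.
AGE_LABELS: Dict[str, Tuple[float, float, float, float]] = {
    "young": (-INF, -INF, 30, 45),
    "middle_aged": (30, 45, 55, 65),
    "old": (55, 65, INF, INF),
}


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership is 1 on [b, c], rises linearly on (a, b), falls linearly on (c, d), else 0."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def fuzzify(value: float, labels: Dict[str, Tuple[float, float, float, float]]) -> Dict[str, float]:
    """Membership degree of `value` in every linguistic label."""
    return {label: trapezoid(value, *params) for label, params in labels.items()}


print(fuzzify(40, AGE_LABELS))
```

    A 40-year-old then belongs partly to "young" (1/3) and partly to "middle_aged" (2/3) instead of being forced into one crisp bin, which is what lets the later association rules carry this uncertainty.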